Neurotrypsin



Technical Field 



The present invention is directed to neurotrypsins and to a pharmaceutical 
composition which contains these substances or has an influence on these substances. 



Disclosure of Invention

Neurotrypsin is a newly discovered serine protease, which is predominantly 
expressed in the brain and in the lungs; the expression in the brain takes place nearly 
exclusively in the neurons. 

Neurotrypsin has a domain composition that has not been found previously: besides the
protease domain, it contains 3 or 4 SRCR (scavenger receptor cysteine-rich)
domains and one Kringle domain. Notably, the combination of Kringle
and SRCR domains has not yet been found in any protein. At the amino terminus of the
neurotrypsin protein there is a segment of more than 60 amino acids, which has an
extremely high proportion of proline and basic amino acids (arginine and histidine).
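
The composition of this amino-terminal segment can be checked with a short count. The following is a minimal sketch, assuming that the segment corresponds to residues 1 to 72 of the mature human protein (nucleotides 104 .. 319 of the compound of the formula I) and that the one-letter string below is a faithful transcription of the three-letter sequence listing; the function names are illustrative.

```python
from collections import Counter

# Residues 1-72 of the mature human protein, transcribed into one-letter code
# from the sequence listing of the compound of the formula I (assumption:
# the transcription is faithful to the printed listing).
N_TERMINAL_SEGMENT = (
    "FDSVLNDSLHHSHRHS"
    "PPAGPHYPYYLPTQQR"
    "PPTTRPPPPLPRFPRP"
    "PRALPAQRPHALQAGH"
    "TPRPHPWG"
)


def residue_counts(segment: str, residues: str) -> dict:
    """Count how often each of the given residues occurs in the segment."""
    counts = Counter(segment)
    return {aa: counts[aa] for aa in residues}


def residue_fraction(segment: str, residues: str) -> float:
    """Fraction of the segment made up of the given residues."""
    return sum(residue_counts(segment, residues).values()) / len(segment)


# Proline (P) and the basic residues arginine (R), histidine (H), lysine (K).
print(residue_counts(N_TERMINAL_SEGMENT, "PRHK"))
print(f"{residue_fraction(N_TERMINAL_SEGMENT, 'PRHK'):.0%}")
```

Under these assumptions, proline, arginine, and histidine together make up half of the segment, and lysine does not occur in it.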

The invention is characterized by the features stated in the independent claims.
Preferred embodiments are defined in the dependent claims. 

The newly found neurotrypsins
- neurotrypsin of the human (compound of the formula I),
- neurotrypsin of the mouse (compound of the formula II)

differ structurally very much from the serine proteases known so far.

The serine protease whose protease domain is structurally most closely related
to the protease domain of the new compounds, namely plasmin (of the human), has
an amino acid sequence identity of only 44 %.

The proline-rich, basic segment at the amino terminus has a certain resemblance
to the basic segments of the netrins and the semaphorins/collapsins. Owing to this
segment, it is probable that neurotrypsin can be enriched by means of heparin-affinity
chromatography.

The neurotrypsins of the human (compound of the formula I) and of the mouse
(compound of the formula II) exhibit a very high structural similarity to each other.

The identity of the amino acid sequences of the native proteins of the compounds
of the formulas I and II amounts to 81 %.


The neurotrypsin of the human (compound of the formula I) has a coding
sequence of 2625 nucleotides. The coded peptide of the compound of the formula I has
a length of 875 amino acids and contains a signal peptide of 20 amino acids. The
neurotrypsin of the mouse (compound of the formula II) has a coding sequence of 2283
nucleotides. The coded protein of the compound of the formula II has a length of 761
amino acids and contains a signal peptide of 21 amino acids. The human neurotrypsin
is longer because it has 4 SRCR domains, whereas the neurotrypsin of the mouse has
only 3 SRCR domains.
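
The stated lengths are consistent with three nucleotides per codon. A minimal sketch of this arithmetic, using only the figures given above (the helper name is illustrative):

```python
def codons(n_nucleotides: int) -> int:
    """Number of whole codons in a coding sequence of the given length."""
    if n_nucleotides % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return n_nucleotides // 3


# Figures taken from the text: coding length and signal peptide length.
human_total = codons(2625)   # compound of the formula I
mouse_total = codons(2283)   # compound of the formula II

print(human_total, human_total - 20)  # total length, length without signal peptide
print(mouse_total, mouse_total - 21)
```

The difference between the two totals (114 positions) is in the range of one SRCR domain, in line with the explanation given for the greater length of the human protein.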

The domains which are present in both compounds (compound of the formula I
and compound of the formula II) have a high degree of sequence similarity. The
corresponding SRCR domains of the compounds of the formulas I and II have an amino
acid sequence identity of 81% to 91%. The corresponding Kringle domains have an
amino acid sequence identity of 75%. A high degree of similarity is also found in the
enzymatically active (i.e. proteolytic) domain (90% amino acid sequence identity).







The protease domains of the neurotrypsins of the human (compound of the 
formula I) and of the mouse (compound of the formula II) are aligned in the following 
section, in order to illustrate the high degree of sequence identity. 

Of the 268 amino acid sequence positions included in the comparison,
233 amino acids are identical in both compounds (upper sequence: compound of
the formula I; lower sequence: compound of the formula II; identical amino acids are
indicated by vertical lines).
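
An identity value of this kind is a count of matching positions over the compared positions. The following is a minimal sketch, assuming two pre-aligned sequences of equal length in which "-" marks a gap; the function name and the short toy sequences are illustrative and are not taken from the alignment of the two compounds.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over all gap-free aligned positions."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # positions with a gap are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free positions to compare")
    return 100.0 * identical / compared


# Toy example: 6 of 8 positions are identical.
print(percent_identity("WVLTAAHC", "WVVSAAHC"))

# The counts quoted in the text: 233 identical out of 268 compared positions.
print(round(100.0 * 233 / 268, 1))
```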

The inventive neurotrypsins are unique when compared with the known serine
proteases in that, according to currently available observations, they are expressed
to a marked degree in neurons. A further organ with strong expression of neurotrypsin
is the lung (see Gschwend et al., Mol. Cell. Neurosci. 2, pages 207-219, 1997).

The proteins that are structurally most similar to the compounds of the formulas I
or II are serine proteases, such as tissue-type plasminogen activator (tPA), urokinase-type
plasminogen activator (uPA), plasmin, trypsin, apolipoprotein (a), coagulation factor
XI, neuropsin, and acrosin.

In the adult brain, the inventive compounds are expressed predominantly in the
cerebral cortex, the hippocampus, and the amygdala.

In the adult brain stem and the spinal cord, the inventive compounds are
expressed predominantly in the motor neurons. A slightly weaker expression is found in
the neurons of the superficial layers of the dorsal horn of the spinal cord.

In the adult peripheral nervous system, the inventive compounds are expressed in 
a subpopulation of the sensory ganglia neurons. 






The inventive compounds were found in connection with a study aimed at 
discovering trypsin-like serine proteases in the nervous system. 

The first compound that was found and characterized was the compound of the
formula II (Gschwend et al., Mol. Cell. Neurosci. 2, pages 207-219, 1997).

The sequences of the primer oligonucleotides for the polymerase chain reaction were
determined by aligning the protease domains of 7 known serine proteases
(tissue-type plasminogen activator, urokinase-type plasminogen activator, thrombin,
plasmin, trypsin, chymotrypsin, and pancreatic elastase) in the proximity of the histidine
and the serine of the catalytic triad of the active site.

The primer oligonucleotides were used in a polymerase chain reaction (PCR)
together with ss-cDNA from total RNA of the brains of 10-day-old mice and resulted in
the amplification of a cDNA fragment of a length of approximately 500 base pairs.

This cDNA fragment was used successfully for the isolation of further cDNA
fragments by screening commercially available cDNA libraries. Together, the isolated
cDNA fragments covered the full length of the coding part of the compound of the
formula II.

By conventional DNA sequencing, the complete nucleotide sequence and the
amino acid sequence deduced from it were obtained.
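
Deducing an amino acid sequence from a nucleotide sequence amounts to reading the coding strand codon by codon with the standard genetic code. The following is a minimal sketch; the function name is illustrative, and the example input consists of the first nine codons of the coding sequence printed in the sequence listing of the compound of the formula I.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate a coding strand (5'->3') into one-letter amino acids.

    Translation starts at the first base and stops at the first stop codon.
    """
    sequence = coding_sequence.upper().replace(" ", "")
    protein = []
    for i in range(0, len(sequence) - 2, 3):
        amino_acid = CODON_TABLE[sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First nine codons of the human coding sequence (start of the signal peptide).
print(translate("ATG ACG CTC GCC CGC TTC GTG CTA GCC"))
```

The result corresponds to Met Thr Leu Ala Arg Phe Val Leu Ala, as printed under the nucleotide sequence in the listing.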

The compound of the formula I was cloned based on its pronounced similarity with
the compound of the formula II.

The primer oligonucleotides used were synthesized according to the known 
sequence of the compound of the formula II. 






The cloning of the compound of the formula I was performed by means of two 
commercially available cDNA libraries from fetal human brain. 

This procedure for the cloning can also be used for the isolation of the homologous 
compounds of other species, such as rat, rabbit, guinea pig, cow, sheep, pig, primates, 
birds, zebra fish (Brachydanio rerio), Drosophila melanogaster, Caenorhabditis elegans 
etc. 

The coding nucleotide sequences can be used for the production of proteins with
the coded amino acid sequences of the compounds of the formulas I or II. A procedure
developed in our laboratory allows the production of recombinant proteins in myeloma
cells as fusion proteins with an immunoglobulin domain (constant domain of the kappa
light chain). The principle of the construction is described in detail by Rader et al.
(Eur. J. Biochem. 215, pages 133-141, 1993). The fusion protein produced by the
myeloma cells was isolated by immunoaffinity chromatography using a monoclonal
antibody against the Ig domain of the kappa light chain. The same expression
method can also be used to produce the native protein of a compound, starting from
its coding sequence.

The coding sequences of the compounds of the formulas I or II can be used as 
starting compounds for the discovery and the isolation of alleles of the compounds of the 
formulas I or II. Both the polymerase chain reaction and the nucleic acid hybridization 
can be used for this purpose. 

The coding sequences of the compounds of the formulas I or II can be used as
starting compounds for so-called "site-directed mutagenesis", in order to generate
nucleotide sequences that code for the proteins defined by the compounds of
the formulas I or II, or parts thereof, but that are degenerate with
respect to the compounds of the formulas I or II owing to the use of alternative codons.
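
The number of such degenerate nucleotide sequences grows quickly with peptide length, because most amino acids are coded by several codons. A minimal sketch of the count, assuming the standard genetic code; the function name is illustrative, and the example peptide is the first four residues (Met Thr Leu Ala) of the compound of the formula I.

```python
from collections import Counter
from math import prod

# One letter per codon of the standard genetic code (TCAG ordering);
# "*" marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of synonymous codons available for each amino acid.
CODONS_PER_AMINO_ACID = Counter(AMINO_ACIDS)


def count_encodings(peptide: str) -> int:
    """Number of distinct nucleotide sequences that code for the peptide."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide.upper())


# Met (1 codon) x Thr (4) x Leu (6) x Ala (4)
print(count_encodings("MTLA"))
```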

The coding sequences of the compounds of the formulas I or II can be used as 
starting compounds for the production of sequence variants by means of so-called site- 
directed mutagenesis. 

Best Modes for Carrying out the Invention (Examples)

cDNA cloning of the compound of the formula II (neurotrypsin of the mouse) 

Total RNA was isolated from the brains of 10-day-old mice (ICR-ZUR) according
to the method of Chomczynski and Sacchi (1987). Single-stranded cDNA was produced
using an oligo(dT) primer and an RNA-dependent DNA polymerase
(SuperScript RNase H- Reverse Transcriptase; Gibco BRL, Gaithersburg, MD) according
to the instructions of the supplier. For the polymerase chain reaction, one
forward primer was synthesized based on the amino acid sequence of the region of the
conserved histidine of the catalytic triad, and one primer in the backward direction was
synthesized based on the amino acid sequence of the region of the conserved serine of
the catalytic triad of the serine proteases. The amino acid sequences used for the
determination of the oligonucleotide primers were taken from seven known serine
proteases. They are presented in the following.




Protease domain (the aligned residues around the conserved histidine and serine are
not legible in the source):

tPA (m)
uPA (m)
thrombin (m)
plasmin (m)
trypsin (m)
chymTryp B (r)
pancElas II (m)

(I)  5'-TGG GTI SYI WSI GCI GCI CAY TG-3'
(II) 3'-ACR DTY CCI CTR WSI CCI CC-5'



The protease domains of 7 known serine proteases (tissue-type plasminogen
activator, urokinase-type plasminogen activator, thrombin, plasmin, trypsin,
chymotrypsin, and pancreatic elastase) were aligned in the region of the conserved
histidine and serine of the catalytic triad of the active site. The conserved amino acids
of these regions were taken as the basis for the determination of the degenerate
primers. The primer sequences are given according to the recommendation of the IUB
nomenclature (Nomenclature Committee 1985).
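
In the IUB nomenclature, each ambiguity letter stands for a defined set of bases, so a degenerate primer denotes a pool of individual oligonucleotides. The following is a minimal sketch of how the pool size can be counted and the pool enumerated. It assumes that inosine (I) is treated as a single universal base that adds no degeneracy; the function names and the short example strings are illustrative and are not the primers of this example.

```python
from itertools import product

# IUB ambiguity codes for nucleotides. "I" (inosine) is kept as one symbol,
# on the assumption that it pairs with any base and adds no degeneracy.
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "I": "I",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides denoted by a degenerate primer."""
    total = 1
    for symbol in primer.upper():
        total *= len(IUB_CODES[symbol])
    return total


def expand(primer: str) -> list:
    """List every individual oligonucleotide denoted by the primer."""
    choices = [IUB_CODES[symbol] for symbol in primer.upper()]
    return ["".join(combination) for combination in product(*choices)]


print(degeneracy("SYIWSI"))  # S, Y, W, S each stand for two bases
print(expand("CAYTG"))
```

Using inosine at fully ambiguous positions, as in the primers of this example, keeps the pool small compared with writing N at the same positions.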

The primers used in the PCR contained restriction sites for EcoRI and BamHI at
their 5' ends in order to facilitate a subsequent cloning.

The following primers were used:



In the reading direction (sense primers): 

5'-GGGGAATTCTGGGTI(C/G)(T/C)I(T/A)(G/C)IGCIGCICA(T/C)TG-3'

In the counter direction (antisense primers): 

5'-GGGGGATCCCCICCI(G/C)(A/T)(A/G)TCICC(C/T)T(G/A/T)(G/A)CA-3'.



The polymerase chain reaction was carried out under standard conditions using
the DNA polymerase AmpliTaq (Perkin Elmer) according to the recommendations of the
manufacturer. The following PCR profile was employed: 93°C for 3 minutes, followed by 35
cycles of 93°C for 1 minute, 48°C for 2 minutes, and 72°C for 2 minutes. Following the
last cycle, the incubation was continued at 72°C for a further 10 minutes.
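
The PCR profile can be written down as a small data structure, which also gives the total programmed incubation time. A minimal sketch, using only the temperatures and times stated above; the class and function names are illustrative, and ramp times between the steps are not included because they are not stated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temperature_c: int
    minutes: float


# Profile as stated in the text.
INITIAL_DENATURATION = Step(93, 3)
CYCLE = (Step(93, 1), Step(48, 2), Step(72, 2))  # denature, anneal, extend
CYCLE_COUNT = 35
FINAL_EXTENSION = Step(72, 10)


def total_minutes() -> float:
    """Total programmed hold time of the profile, without ramp times."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return (
        INITIAL_DENATURATION.minutes
        + CYCLE_COUNT * per_cycle
        + FINAL_EXTENSION.minutes
    )


print(total_minutes())  # 3 + 35 * 5 + 10 minutes
```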



The amplified fragments had an approximate length of 500 base pairs. They were
cut with EcoRI and BamHI and inserted in a Bluescript vector (Bluescript SK(-),
Stratagene). The resulting clones were analyzed by DNA sequence determination using
the dideoxy chain termination method (Sanger et al., Proc. Natl. Acad. Sci. USA 77,
pages 2163-2167, 1977) on an automated DNA sequencer (LI-COR, model 4000L;
Lincoln, NE) using a commercial sequencing kit (SequiTherm long-read cycle sequencing
kit-LC; Epicentre Technologies, Madison, WI). The analysis yielded a sequence of 474
base pairs of the catalytic region of the serine protease domain of the compound of the
formula II.



The 474 base pair long PCR fragment was used for screening an oligo(dT)-primed
Uni-ZAP-XR cDNA library from the brain of 20-day-old mice (Stratagene; cat.
no. 937 319). A total of 3 x 10^6 lambda plaques were screened under highly stringent
conditions (Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring
Harbor Laboratory Press, 1989) using a radioactively labeled PCR fragment as a probe,
and 24 positive clones were found.
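
The outcome of such a screen can be expressed as the frequency of positive clones among the screened plaques. A minimal sketch with the figures stated above; the function name is illustrative.

```python
def plaques_per_positive(plaques_screened: int, positive_clones: int) -> float:
    """Average number of screened plaques per positive clone."""
    if positive_clones <= 0:
        raise ValueError("at least one positive clone is required")
    return plaques_screened / positive_clones


# 24 positive clones among 3 x 10^6 plaques.
print(plaques_per_positive(3_000_000, 24))
```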


From the positive Lambda-Uni-ZAP-XR phagemid clones the corresponding
Bluescript plasmid was cut out by in vivo excision according to a standard method
recommended by the manufacturer (Stratagene). In order to determine the length of the
inserted fragments, the corresponding Bluescript plasmid clones were digested with SacI
and KpnI. The clones containing the longest fragments were analyzed by DNA
sequencing (as described above), and the GCG software
(version 8.1, Unix; Silicon Graphics, Inc.) was used for the subsequent data analysis.

Because none of the clones contained the coding sequence in full length, a second
cDNA library was screened. The library used in this screen was an oligo(dT)- and
random-primed cDNA library in a Lambda phage (Lambda gt10) which was based on
mRNA from 15-day-old mouse embryos (oligo(dT)- and random-primed Lambda gt10
cDNA library; Clontech, Palo Alto, CA; cat. no. ML 3002a). As a probe, a radioactively
labeled DNA fragment (AvaI/AatII) from the 5' end of the longest clone of the first screen
was used, and approximately 2 x 10^6 plaques were screened. This screen resulted in 14
positive clones. The cDNA fragments were excised with EcoRI and cloned into the
Bluescript vector (KS(+); Stratagene). The sequence analysis was carried out as
described above.

In this way the nucleotide sequence over the full length cDNA of 2361 and 2376
base pairs, respectively, of the compound of the formula II was obtained. With the
described procedure of PCR cloning it is also possible to find and isolate variant forms of
the compounds of the formulas I or II, as for example their alleles or their splice variants.
The described method of screening a cDNA library also allows the detection and the
isolation of compounds which hybridize under stringent conditions with the coding
sequences of the compounds of the formulas I or II.

Cloning of the cDNA of the compound of the formula I (neurotrypsin of the human)

The cloning of the cDNA of the compound of the formula I was carried out based
on the nucleotide sequence of the compound of the formula II. As a first step, a fragment
of the compound of the formula I was amplified using the polymerase chain reaction
(PCR). As a template we used the DNA obtained from a commercially available cDNA
library from the brain of a human fetus (17th - 18th week of pregnancy) (Oligo(dT)-
and random-primed, human fetal brain cDNA library in the Lambda ZAP II vector, cat.
no. 936206, Stratagene). The synthetic PCR primers contained restriction sites for
HindIII and XhoI at the 5' end in order to facilitate the subsequent cloning.

In the reading direction (sense primers): 

5'-GGGAAGCTTGGICA(A/G)TGGGGIACI(A/G)TITG(C/T)GA(C/T)-3'

In the counter direction (antisense primers): 

5'-GGGCTCGAGOCCCAICCTGTTATGTAAIAGTTG-3' 



The PCR was carried out under standard conditions using the DNA polymerase
AmpliTaq (Perkin Elmer) according to the recommendations of the manufacturer. The
resulting fragment of 1116 base pairs was inserted into the Bluescript vector (Bluescript
SK(-), Stratagene). A 600 base pair long HindIII/SfuI fragment, corresponding to the 5'
half of the 1116 base pair long PCR fragment, was used for the screening of a Lambda
cDNA library from human fetal brain (Human Fetal Brain 5'-STRETCH PLUS cDNA
library; Lambda gt10; cat. no. HL 3003a; Clontech). 2 x 10^6 Lambda plaques were
screened under highly stringent conditions (Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989) by means of a
radioactively labeled PCR fragment, and 23 positive clones were found and isolated.

From the positive Lambda gt10 clones the corresponding cDNA fragments were 
excised with EcoRI and inserted into a Bluescript vector (Bluescript KS(+), Stratagene). 
The sequencing was carried out by means of the dideoxy chain termination method 
(Sanger et al., Proc. Natl. Acad. Sci. USA 71, pages 2163-2167, 1977), using a 
commercial sequencing kit (SequiTherm long-read cycle sequencing kit-LC; Epicentre 
Technologies, Madison, WI) and Bluescript-specific primers.

In an alternative sequencing strategy, the cDNA fragments of the positive Lambda
gt10 clones were PCR amplified using Lambda-specific primers. The sequencing was
carried out as described above.

The computerized analysis of the sequences was performed by means of the
program package GCG (version 8.1, Unix; Silicon Graphics Inc.).

In this way the nucleotide sequence over the full length of the cDNA of 3350 base
pairs was obtained. With the described procedure for PCR cloning it is also possible to find
and to isolate variant forms of the compounds of the formulas I or II, as for example
their alleles or their splice variants. The described procedure for the screening of a
cDNA library also allows the discovery and the isolation of compounds which hybridize
under stringent conditions with the coding sequences of the compounds of the formulas I
or II.

Visualization of the coded sequences of the compounds of the formulas I or II by
means of antibodies



The proline-rich, basic segment of more than 60 amino acids at the amino
terminus of the coded sequence of the compounds of the formulas I or II is well suited
for the production of antibodies by means of synthesizing peptides and using them for
immunization. We have selected two peptide sequences with a length of 19 and 13
amino acids from the proline-rich, basic segment at the amino terminus of the coded
sequence of the compound of the formula II for the generation of antibodies. The
peptides had the following sequences:
Peptide 1: H2N-SRS PLH RRH PSP PRS QX-CONH2
Peptide 2: H2N-LPS SRR PPR VPR F-COOH

The two peptides were synthesized chemically, coupled to a macromolecular
carrier (Keyhole Limpet Hemocyanin), and injected into 2 rabbits for immunization. The
resulting antisera exhibited a high antibody titer and could successfully be used both for the
identification of native neurotrypsin in brain extract of the mouse and for the identification
of recombinant neurotrypsin. The procedure employed for the generation of antibodies
can also be used for the generation of antibodies against the coded sequence of the
compound of the formula I.

The resulting antibodies against the partial sequences of the coded sequences of
the compounds of the formulas I or II can be used for the detection and the isolation of
variant forms of the compounds of the formulas I or II, as for example alleles or splice
variants. Such antibodies can also be used for the detection and isolation of variants of
the compounds of the formulas I or II that have been generated by genetic engineering.

Purification of the coded sequences of the compounds of the formulas I or II

Besides conventional chromatographic methods, as for example ion exchange
chromatography, the purification of the coded sequences of the compounds of the
formulas I or II can also be achieved using two affinity chromatographic purification
procedures. One affinity chromatographic purification procedure is based on the
availability of antibodies. Coupling the antibodies to a chromatographic matrix gives a
purification procedure in which a very high degree of purity of the corresponding
compound can be achieved in one step.

Another important feature that can be used for the purification of the coded
sequences of the compounds of the formulas I or II is the proline-rich, basic segment at
the amino terminus. It may be expected that, due to the high density of positive charges,
this segment mediates the binding of the coded sequences of the compounds of the
formulas I or II to heparin and heparin-like affinity matrices. This principle also allows the
isolation, or at least the enrichment, of variant forms of the coded sequences of the
compounds of the formulas I or II, as for example their alleles or splice variants. Likewise,
heparin affinity chromatography can be used for the isolation, or at least the
enrichment, of species-homologous proteins of the compounds of the formulas I or II.

Industrial Applicability

The coding sequences of the formulas I and II can be used for the production of
the coded proteins or parts thereof of the formulas I and II. The production of the coded
proteins can be achieved in procaryotic or eucaryotic expression systems.

The gene expression pattern of the inventive compounds in the brain is extremely
interesting, because these molecules are expressed in the adult nervous system
predominantly in neurons of those regions that are thought to play an important role in
learning and memory functions. Together with the recently found evidence for a role of
extracellular proteases in neural plasticity, the expression pattern allows the assumption
that the proteolytic activity of neurotrypsin has a role in structural reorganizations in
connection with learning and memory operations, for example operations which are
involved in the processing and storage of learned behaviors, learned emotions, or
memory contents. The inventive compounds may, thus, represent a target for
pharmaceutical intervention in malfunctions of the brain.

The gene expression pattern of the inventive compounds in the cerebral cortex
(especially layers V and VI) is extremely interesting, because a reduction of the cellular
differentiation in the cerebral cortex has been found to be associated with schizophrenia.
The inventive compounds may, thus, be a target for pharmaceutical intervention in
schizophrenia and related psychiatric diseases.

The expression of the coding sequences of the inventive compounds has been found to be
increased in the neurons located adjacent to the damaged tissue of a focal ischemic
stroke, indicating that the inventive compounds play a role in the tissue reaction in the
injured cerebral tissue. The inventive compounds may, thus, represent a target for
pharmaceutical intervention after ischemic stroke and other forms of neural tissue
damage.






Tissue-type plasminogen activator, a serine protease related to the inventive
compounds, has recently been found to be involved in excitotoxicity-mediated neuronal
cell death. A similar function is conceivable for the inventive compounds and, thus, the
inventive compounds represent a possible target for a pharmacological intervention in
diseases in which cell death occurs.

The gene expression pattern of the inventive compounds in the spinal cord and in
the sensory ganglia is interesting, because these molecules are expressed in the adult
nervous system in neurons of those regions that are thought to play a role in the
processing of pain, as well as in the pathogenesis of pathological pain. The inventive
compounds may, thus, be a target for pharmaceutical intervention in pathological pain.



In the following part statements concerning the compounds of the formulas I or II 
are given: 

(1) INFORMATION ABOUT THE COMPOUND OF THE FORMULA I
(Neurotrypsin of the human)

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3350 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single strand
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens
(D) DEVELOPMENT STAGE: fetal
(F) TISSUE TYPE: brain

(vii) IMMEDIATE SOURCE:

(A) LIBRARY: human fetal brain 5'-stretch plus cDNA library in the lambda
gt10 vector; catalog No. HL 3003a; Clontech, Palo Alto, CA, USA.
(B) CLONE: cDNA Clone No.: 3-1, 3-2, 3-6, 3-7, 3-8, 3-10, 3-11, 3-12

(ix) FEATURE:

(A) NAME/KEY: Signal peptide
(B) LOCATION: 44 .. 103

(ix) FEATURE:

(A) NAME/KEY: mature peptide
(B) LOCATION: 104 .. 2668

(ix) FEATURE:

(A) NAME/KEY: coding sequence
(B) LOCATION: 44 .. 2668

(ix) FEATURE:

(A) NAME/KEY: Proline-rich, basic segment
(B) LOCATION: 104 .. 319

(ix) FEATURE:

(A) NAME/KEY: Kringle domain
(B) LOCATION: 320 .. 538

(ix) FEATURE:

(A) NAME/KEY: SRCR domain 1
(B) LOCATION: 551 .. 856

(ix) FEATURE:

(A) NAME/KEY: SRCR domain 2
(B) LOCATION: 881 .. 1186

(ix) FEATURE:

(A) NAME/KEY: SRCR domain 3
(B) LOCATION: 1202 .. 1504

(ix) FEATURE:

(A) NAME/KEY: SRCR domain 4
(B) LOCATION: 1541 .. 1846

(ix) FEATURE:

(A) NAME/KEY: proteolytic domain
(B) LOCATION: 1898 .. 2668

(ix) FEATURE:

(A) NAME/KEY: histidine of the catalytic triad
(B) LOCATION: 2069 .. 2071

(ix) FEATURE:

(A) NAME/KEY: aspartic acid of the catalytic triad
(B) LOCATION: 2219 .. 2221

(ix) FEATURE:

(A) NAME/KEY: serine of the catalytic triad
(B) LOCATION: 2516 .. 2518

(ix) FEATURE:

(A) NAME/KEY: polyA signal
(B) LOCATION: 2873 .. 2878

(ix) FEATURE:

(A) NAME/KEY: polyA signal
(B) LOCATION: 3034 .. 3039

(ix) FEATURE:

(A) NAME/KEY: polyA signal
(B) LOCATION: 3215 .. 3220

(ix) FEATURE:

(A) NAME/KEY: 3'UTR
(B) LOCATION: 2669 .. 3350

(ix) FEATURE:

(A) NAME/KEY: 5'UTR
(B) LOCATION: 1 .. 43
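
The feature locations are inclusive nucleotide positions, so the length of each feature in codons and the residue number of each catalytic amino acid can be derived from them. The following is a minimal sketch using the locations listed above; the dictionary layout and function names are illustrative, and residue numbers are counted from the first residue of the mature peptide (nucleotide 104), as in the sequence listing.

```python
# Inclusive nucleotide locations taken from the feature list.
FEATURES = {
    "signal peptide": (44, 103),
    "mature peptide": (104, 2668),
    "coding sequence": (44, 2668),
    "proline-rich, basic segment": (104, 319),
    "Kringle domain": (320, 538),
    "SRCR domain 1": (551, 856),
    "SRCR domain 2": (881, 1186),
    "SRCR domain 3": (1202, 1504),
    "SRCR domain 4": (1541, 1846),
    "proteolytic domain": (1898, 2668),
}

# First nucleotide of the codon of each catalytic triad residue.
CATALYTIC_TRIAD = {"His": 2069, "Asp": 2219, "Ser": 2516}

MATURE_START = 104  # first nucleotide of the mature peptide


def length_in_codons(start: int, end: int) -> int:
    """Length of an inclusive nucleotide range, expressed in codons."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("feature length is not a multiple of 3")
    return length // 3


def mature_residue_number(codon_start: int) -> int:
    """Residue number (mature peptide numbering) of the codon at this position."""
    offset = codon_start - MATURE_START
    if offset < 0 or offset % 3 != 0:
        raise ValueError("position is not a codon start in the mature peptide")
    return offset // 3 + 1


for name, (start, end) in FEATURES.items():
    print(f"{name}: {length_in_codons(start, end)} codons")

for residue, position in CATALYTIC_TRIAD.items():
    print(f"{residue}: residue {mature_residue_number(position)}")
```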

Compound of the formula I (neurotrypsin of the human)



CGGAAGCTGG GGAGCATGGA CCAGACCCCG CAGCGCTGGC ACC ATG ACG CTC GCC 55
Met Thr Leu Ala
-20

CGC TTC GTG CTA GCC CTG ATG TTA GGG GCG CTC CCC GAA GTG GTC GGC 103
Arg Phe Val Leu Ala Leu Met Leu Gly Ala Leu Pro Glu Val Val Gly
-15 -10 -5 -1

TTT GAT TCT GTC CTC AAT GAT TCC CTC CAC CAC AGC CAC CGC CAT TCG 151
Phe Asp Ser Val Leu Asn Asp Ser Leu His His Ser His Arg His Ser
1 5 10 15

CCC CCT GCG GGT CCG CAC TAC CCC TAT TAC CTT CCC ACC CAG CAG CGG 199
Pro Pro Ala Gly Pro His Tyr Pro Tyr Tyr Leu Pro Thr Gln Gln Arg
20 25 30

CCC CCG ACG ACG CGT CCG CCG CCG CCT CTC CCG CGC TTC CCG CGC CCC 247
Pro Pro Thr Thr Arg Pro Pro Pro Pro Leu Pro Arg Phe Pro Arg Pro
35 40 45

CCG CGG GCG CTC CCT GCC CAG CGC CCG CAC GCC CTC CAG GCC GGG CAC 295
Pro Arg Ala Leu Pro Ala Gln Arg Pro His Ala Leu Gln Ala Gly His
50 55 60

ACG CCC CGG CCG CAC CCC TGG GGC TGC CCC GCC GGC GAG CCA TGG GTC 343
Thr Pro Arg Pro His Pro Trp Gly Cys Pro Ala Gly Glu Pro Trp Val
65 70 75 80

AGC GTG ACG GAC TTC GGC GCC CCG TGT CTG CGG TGG GCG GAG GTG CCA 391
Ser Val Thr Asp Phe Gly Ala Pro Cys Leu Arg Trp Ala Glu Val Pro
85 90 95

CCC TTC CTG GAG CGG TCG CCC CCA GCG AGC TGG GCT CAG CTG CGA GGA 439
Pro Phe Leu Glu Arg Ser Pro Pro Ala Ser Trp Ala Gln Leu Arg Gly
100 105 110

CAG CGC CAC AAC TTT TGT CGG AGC CCC GAC GGC GCG GGC AGA CCC TGG 487
Gln Arg His Asn Phe Cys Arg Ser Pro Asp Gly Ala Gly Arg Pro Trp
115 120 125

TGT TTC TAC GGA GAC GCC CGT GGC AAG GTG GAC TGG GGC TAC TGC GAC 535
Cys Phe Tyr Gly Asp Ala Arg Gly Lys Val Asp Trp Gly Tyr Cys Asp
130 135 140

TGC AGA CAC GGA TCA GTA CGA CTT CGT GGC GGC AAA AAT GAG TTT GAA 583
Cys Arg His Gly Ser Val Arg Leu Arg Gly Gly Lys Asn Glu Phe Glu
145 150 155 160

GGC ACA GTG GAA GTA TAT GCA AGT GGA GTT TGG GGC ACT GTC TGT AGC 631
Gly Thr Val Glu Val Tyr Ala Ser Gly Val Trp Gly Thr Val Cys Ser
165 170 175

AGC CAC TGG GAT GAT TCT GAT GCA TCA GTC ATT TGT CAC CAG CTG CAG 679
Ser His Trp Asp Asp Ser Asp Ala Ser Val Ile Cys His Gln Leu Gln
180 185 190

CTG GGA GGA AAA GGA ATA GCA AAA CAA ACC CCG TTT TCT GGA CTG GGC 727
Leu Gly Gly Lys Gly Ile Ala Lys Gln Thr Pro Phe Ser Gly Leu Gly
195 200 205

CTT ATT CCC ATT TAT TGG AGC AAT GTC CGT TGC CGA GGA GAT GAA GAA 775
Leu Ile Pro Ile Tyr Trp Ser Asn Val Arg Cys Arg Gly Asp Glu Glu
210 215 220

AAT ATA CTG CTT TGT GAA AAA GAC ATC TGG CAG GGT GGG GTG TGT CCT 823
Asn Ile Leu Leu Cys Glu Lys Asp Ile Trp Gln Gly Gly Val Cys Pro
225 230 235 240

CAG AAG ATG GCA GCT GCT GTC ACG TGT AGC TTT TCC CAT GGC CCA ACG 871
Gln Lys Met Ala Ala Ala Val Thr Cys Ser Phe Ser His Gly Pro Thr
245 250 255

TTC CCC ATC ATT CGC CTT GCT GGA GGC AGC AGT GTG CAT GAA GGC CGG 919
Phe Pro Ile Ile Arg Leu Ala Gly Gly Ser Ser Val His Glu Gly Arg
260 265 270

GTG GAG CTC TAC CAT GCT GGC CAG TGG GGA ACC GTT TGT GAT GAC CAA 967
Val Glu Leu Tyr His Ala Gly Gln Trp Gly Thr Val Cys Asp Asp Gln
275 280 285

TGG GAT GAT GCC GAT GCA GAA GTG ATC TGC AGG CAG CTG GGC CTC AGT 1015
Trp Asp Asp Ala Asp Ala Glu Val Ile Cys Arg Gln Leu Gly Leu Ser
290 295 300

GGC ATT GCC AAA GCA TGG CAT CAG GCA TAT TTT GGG GAA GGG TCT GGC 1063
Gly Ile Ala Lys Ala Trp His Gln Ala Tyr Phe Gly Glu Gly Ser Gly
305 310 315 320

CCA GTT ATG TTG GAT GAA GTA CGC TGC ACT GGG AAT GAG CTT TCA ATT 1111
Pro Val Met Leu Asp Glu Val Arg Cys Thr Gly Asn Glu Leu Ser Ile
325 330 335

GAG CAG TGT CCA AAG AGC TCC TGG GGA GAG CAT AAC TGT GGC CAT AAA 1159
Glu Gln Cys Pro Lys Ser Ser Trp Gly Glu His Asn Cys Gly His Lys
340 345 350

GAA GAT GCT GGA GTG TCC TGT ACC CCT CTA ACA GAT GGG GTC ATC AGA 1207
Glu Asp Ala Gly Val Ser Cys Thr Pro Leu Thr Asp Gly Val Ile Arg
355 360 365

CTT GCA GGT GGG AAA GGC AGC CAT GAG GGT CGC TTG GAG GTA TAT TAC 1255
Leu Ala Gly Gly Lys Gly Ser His Glu Gly Arg Leu Glu Val Tyr Tyr
370 375 380

AGA GGC CAG TGG GGA ACT GTC TGT GAT GAT GGC TGG ACT GAG CTG AAT 1303
Arg Gly Gln Trp Gly Thr Val Cys Asp Asp Gly Trp Thr Glu Leu Asn
385 390 395 400

ACA TAC GTG GTT TGT CGA CAG TTG GGA TTT AAA TAT GGT AAA CAA GCA 1351
Thr Tyr Val Val Cys Arg Gln Leu Gly Phe Lys Tyr Gly Lys Gln Ala
405 410 415

TCT GCC AAC CAT TTT GAA GAA AGC ACA GGG CCC ATA TGG TTG GAT GAC 1399
Ser Ala Asn His Phe Glu Glu Ser Thr Gly Pro Ile Trp Leu Asp Asp
420 425 430
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GTC AGC TGC TCA GGA AAG GAA ACC AGA TTT CTT CAG TGT TCC AGG CGA 1447 
Val Ser Cys Ser Gly Lys Glu Thr Arg Phe Leu Gin Cys Ser Arg Arg 
435 440 445 

CAG TGG GGA AGG CAT GAC TGC AGC CAC CGC GAA GAT GTT AGC ATT GCC 14 9 5 

Gin Trp Gly Arg His Asp Cys Ser His Arg Glu Asp Val Ser lie Ala 
450 455 460 

TGC TAC CCT GGC GGC GAG GGA CAC AGG CTC TCT CTG GGT TTT CCT GTC 154 3 

Cys Tyr Pro Gly Gly Glu Gly His Arg Leu Ser Leu Gly Phe Pro Val 
465 470 475 480 

AGA CTG ATG GAT GGA GAA AAT AAG AAA GAA GGA CGA GTG GAG GTT TTT 1591 
Arg Leu Met Asp Gly Glu Asn Lys Lys Glu Gly Arg Val Glu Val Phe 
485 490 495 

ATC AAT GGC CAG TGG GGA ACA ATC TGT GAT GAT GGA TGG ACT GAT AAG 1639
Ile Asn Gly Gln Trp Gly Thr Ile Cys Asp Asp Gly Trp Thr Asp Lys
500 505 510

GAT GCA GCT GTG ATC TGT CGT CAG CTT GGC TAC AAG GGT CCT GCC AGA 1687
Asp Ala Ala Val Ile Cys Arg Gln Leu Gly Tyr Lys Gly Pro Ala Arg
515 520 525

GCA AGA ACC ATG GCT TAC TTT GGA GAA GGA AAA GGA CCC ATC CAT GTG 1735
Ala Arg Thr Met Ala Tyr Phe Gly Glu Gly Lys Gly Pro Ile His Val
530 535 540

GAT AAT GTG AAG TGC ACA GGA AAT GAG AGG TCC TTG GCT GAC TGT ATC 1783
Asp Asn Val Lys Cys Thr Gly Asn Glu Arg Ser Leu Ala Asp Cys Ile
545 550 555 560

AAG CAA GAT ATT GGA AGA CAC AAC TGC CGC CAC AGT GAA GAT GCA GGA 1831
Lys Gln Asp Ile Gly Arg His Asn Cys Arg His Ser Glu Asp Ala Gly
565 570 575

GTT ATT TGT GAT TAT TTT GGC AAG AAG GCC TCA GGT AAC AGT AAT AAA 1879
Val Ile Cys Asp Tyr Phe Gly Lys Lys Ala Ser Gly Asn Ser Asn Lys
580 585 590

GAG TCC CTC TCA TCT GTT TGT GGC TTG AGA TTA CTG CAC CGT CGG CAG 1927
Glu Ser Leu Ser Ser Val Cys Gly Leu Arg Leu Leu His Arg Arg Gln
595 600 605

AAG CGG ATC ATT GGT GGG AAA AAT TCT TTA AGG GGT GGT TGG CCT TGG 1975
Lys Arg Ile Ile Gly Gly Lys Asn Ser Leu Arg Gly Gly Trp Pro Trp
610 615 620

CAG GTT TCC CTC CGG CTG AAG TCA TCC CAT GGA GAT GGC AGG CTC CTC 2023
Gln Val Ser Leu Arg Leu Lys Ser Ser His Gly Asp Gly Arg Leu Leu
625 630 635 640

TGC GGG GCT ACG CTC CTG AGT AGC TGC TGG GTC CTC ACA GCA GCA CAC 2071 
Cys Gly Ala Thr Leu Leu Ser Ser Cys Trp Val Leu Thr Ala Ala His 
645 650 655 

TGT TTC AAG AGG TAT GGC AAC AGC ACT AGG AGC TAT GCT GTT AGG GTT 2119
Cys Phe Lys Arg Tyr Gly Asn Ser Thr Arg Ser Tyr Ala Val Arg Val
660 665 670

GGA GAT TAT CAT ACT CTG GTA CCA GAG GAG TTT GAG GAA GAA ATT GGA 2167
Gly Asp Tyr His Thr Leu Val Pro Glu Glu Phe Glu Glu Glu Ile Gly
675 680 685

GTT CAA CAG ATT GTG ATT CAT CGG GAG TAT CGA CCC GAC CGC AGT GAT 2215
Val Gln Gln Ile Val Ile His Arg Glu Tyr Arg Pro Asp Arg Ser Asp
690 695 700

TAT GAC ATA GCC CTG GTT AGA TTA CAA GGA CCA GAA GAG CAA TGT GCC 2263
Tyr Asp Ile Ala Leu Val Arg Leu Gln Gly Pro Glu Glu Gln Cys Ala
705 710 715 720

AGA TTC AGC AGC CAT GTT TTG CCA GCC TGT TTA CCA CTC TGG AGA GAG 2311
Arg Phe Ser Ser His Val Leu Pro Ala Cys Leu Pro Leu Trp Arg Glu
725 730 735

AGG CCA CAG AAA ACA GCA TCC AAC TGT TAC ATA ACA GGA TGG GGT GAC 2359
Arg Pro Gln Lys Thr Ala Ser Asn Cys Tyr Ile Thr Gly Trp Gly Asp
740 745 750

ACA GGA CGA GCC TAT TCA AGA ACA CTA CAA CAA GCA GCC ATT CCC TTA 2407
Thr Gly Arg Ala Tyr Ser Arg Thr Leu Gln Gln Ala Ala Ile Pro Leu
755 760 765

CTT CCT AAA AGG TTT TGT GAA GAA CGT TAT AAG GGT CGG TTT ACA GGG 2455
Leu Pro Lys Arg Phe Cys Glu Glu Arg Tyr Lys Gly Arg Phe Thr Gly
770 775 780

AGA ATG CTT TGT GCT GGA AAC CTC CAT GAA CAC AAA CGC GTG GAC AGC 2503
Arg Met Leu Cys Ala Gly Asn Leu His Glu His Lys Arg Val Asp Ser
785 790 795 800

TGC CAG GGA GAC AGC GGA GGA CCA CTC ATG TGT GAA CGG CCC GGA GAG 2551
Cys Gln Gly Asp Ser Gly Gly Pro Leu Met Cys Glu Arg Pro Gly Glu
805 810 815

AGC TGG GTG GTG TAT GGG GTG ACC TCC TGG GGG TAT GGC TGT GGA GTC 2599
Ser Trp Val Val Tyr Gly Val Thr Ser Trp Gly Tyr Gly Cys Gly Val
820 825 830

AAG GAT TCT CCT GGT GTT TAT ACC AAA GTC TCA GCC TTT GTA CCT TGG 2647 
Lys Asp Ser Pro Gly Val Tyr Thr Lys Val Ser Ala Phe Val Pro Trp 
835 840 845 

ATA AAA AGT GTC ACC AAA CTG TAA TTCTTCATGG AAACTTCAAA GCAGCATTT 2700
Ile Lys Ser Val Thr Lys Leu *
850 855

AAACAAATGG AAAACTTTGA ACCCCCACTA TTAGCACTCA GCAGAGATGA CAACAAATGG 2760

CAAGATCTGT TTTTGCTTTG TGTTGTGGTA AAAAATTGTG TACCCCCTGC TGCTTTTGAG 2820

AAATTTGTGA ACATTTTCAG AGGCCTCAGT GTAGTGGAAG TGATAATCCT TAAATGAACA 2880

TTTTCTACCC TAATTTCACT GGAGTGACTT ATTCTAAGCC TCATCTATCC CCTACCTATT 2940

TCTCAAAATC ATTCTATGCT GATTTTACAA AAGATCATTT TTACATTTGA ACTGAGAACC 3000

CCTTTTAATT GAATCAGTGG TGTCTGAAAT CATATTAAAT ACCCACATTT GACATAAATG 3060

CGGTACCCTT TACTACACTC ATGAGTGGCA TATTTATGCT TAGGTCTTTT CAAAAGACTT 3120

GACAAGAAAT CTTCATATTC TCTGTAGCCT TTGTCAAGTG AGGAAATCAG TGGTTAAAGA 3180

ATTCCACTAT AAACTTTTAG GCCTGAATAG GAGTAGTAAA GCCTCAAGGA CATCTGCCTG 3240

TCACAATATA TTCTCAAAGT GATCTGATAT TTGGAAACAA GTATCCTTGT TGAGTACCAA 3300

GTGCTACAGA AACCATAAGA TAAAAATACT TTCTACCTAC AGCGTGCCCG 3350
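
Because the listing prints each line of codons directly above its three-letter residues, the pairing can be checked mechanically. The following is a minimal Python sketch of such a check, assuming the standard genetic code applies; the helper names `translate` and `mismatches` are illustrative and are not part of the original disclosure.

```python
# Illustrative check only; the helper names are hypothetical, not from the patent.
BASES = "TCAG"
# Standard genetic code, ordered by first, second and third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "*",  # stop codon, printed as an asterisk in the listing
}


def translate(codon_line):
    """Translate a line of space-separated codons into three-letter residues."""
    return [THREE_LETTER[CODON_TABLE[codon]] for codon in codon_line.split()]


def mismatches(codon_line, residue_line):
    """Return (index, expected, printed) for every residue that disagrees."""
    expected = translate(codon_line)
    printed = residue_line.split()
    return [(i, e, p) for i, (e, p) in enumerate(zip(expected, printed)) if e != p]


# One codon/residue line pair taken from the human sequence (ends at 2071).
codons = "TGC GGG GCT ACG CTC CTG AGT AGC TGC TGG GTC CTC ACA GCA GCA CAC"
residues = "Cys Gly Ala Thr Leu Leu Ser Ser Cys Trp Val Leu Thr Ala Ala His"
print(mismatches(codons, residues))  # an empty list means the two lines agree
```

The check only applies to the coding lines; the untranslated regions are printed in blocks of ten nucleotides and have no residue line to compare against.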
(!) INFORMATION ABOUT THE COMPOUND OF THE FORMULA II (Neurotrypsin of the mouse)

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2376 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single strand 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA to mRNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Mus musculus 

(D) DEVELOPMENT STAGE: postnatal day 10 

(F) TISSUE TYPE: brain 

(G) CELL TYPE: neurons 

(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: mouse brain cDNA library in the lambda Uni-ZAP-XR vector,
oligo(dT)-primed, from Balb/c mice, postnatal day 20,
Cat. No. 937319; Stratagene, La Jolla, CA, USA



(B) CLONE: cDNA clone no. 16 
(vii) IMMEDIATE SOURCE: 

(A) LIBRARY: mouse brain cDNA library in the Lambda gt10 vector,
oligo(dT)- and random-primed, embryonic day 15,
Cat. No. ML 3002a; Clontech, Palo Alto, CA, USA

(B) CLONE: cDNA clone #25

(ix) FEATURE:

(A) NAME/KEY: signal peptide

(B) LOCATION: 24 .. 86

(ix) FEATURE:

(A) NAME/KEY: mature peptide

(B) LOCATION: 87 .. 2306

(ix) FEATURE:

(A) NAME/KEY: coding sequence

(B) LOCATION: 24 .. 2306

(ix) FEATURE:

(A) NAME/KEY: proline-rich, basic segment

(B) LOCATION: 90 .. 275

(ix) FEATURE:

(A) NAME/KEY: Kringle domain

(B) LOCATION: 276 .. 494

(ix) FEATURE:

(A) NAME/KEY: SRCR domain 1

(B) LOCATION: 519 .. 824






WO 98/49322 





PCT/IB98/00625






(ix) FEATURE:

(A) NAME/KEY: SRCR domain 2

(B) LOCATION: 840 .. 1142

(ix) FEATURE:

(A) NAME/KEY: SRCR domain 3

(B) LOCATION: 1179 .. 1484

(ix) FEATURE:

(A) NAME/KEY: proteolytic domain

(B) LOCATION: 1536 .. 2306

(ix) FEATURE:

(A) NAME/KEY: histidine of the catalytic triad

(B) LOCATION: 1707 .. 1709

(ix) FEATURE:

(A) NAME/KEY: aspartic acid of the catalytic triad

(B) LOCATION: 1857 .. 1859

(ix) FEATURE:

(A) NAME/KEY: serine of the catalytic triad

(B) LOCATION: 2154 .. 2156

(ix) FEATURE:

(A) NAME/KEY: polyA signal

(B) LOCATION: 2324 .. 2329 and 2331 .. 2336

(ix) FEATURE:

(A) NAME/KEY: polyA segment

(B) LOCATION: 2357 .. 2376

(ix) FEATURE:

(A) NAME/KEY: 3'UTR

(B) LOCATION: 2307 .. 2341 or 2307 .. 2356

(ix) FEATURE:

(A) NAME/KEY: 5'UTR

(B) LOCATION: 1 .. 23
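
The LOCATION entries are inclusive nucleotide ranges, so feature lengths and residue numbers follow from simple arithmetic. The sketch below is a hedged illustration in Python, not part of the original disclosure: the function names are hypothetical, and it assumes the numbering used in the sequence that follows, in which the mature peptide starts at residue 1 and signal-peptide residues count backwards from -1 with no residue 0.

```python
# Illustrative sketch; the function names are hypothetical, not from the patent.
MATURE_START = 87  # first nucleotide of the mature peptide (from the feature list)


def span_length(start, end):
    """Number of nucleotides in an inclusive LOCATION range."""
    return end - start + 1


def residue_number(codon_start, mature_start=MATURE_START):
    """Residue number of the codon that begins at nucleotide codon_start.

    Assumption: mature residues count 1, 2, 3, ... and signal-peptide
    residues count -1, -2, ... back from the cleavage site (no residue 0).
    """
    offset = codon_start - mature_start
    if offset % 3 != 0:
        raise ValueError("codon_start is not in frame with the mature peptide")
    index = offset // 3
    return index + 1 if index >= 0 else index


print(span_length(24, 86) // 3)    # residues in the signal peptide
print(span_length(87, 2306) // 3)  # residues in the mature peptide
for name, start in [("His", 1707), ("Asp", 1857), ("Ser", 2154)]:
    print(name, residue_number(start))
```

Under these assumptions the signal peptide spans 21 residues, the mature peptide 740 residues, and the catalytic triad falls on His 541, Asp 591 and Ser 690, which agrees with the residue numbering printed under the sequence.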
Compound of the formula II (neurotrypsin of the mouse)



GGACCACACT CGGCGCCGCA GCC ATG GCG CTC GCC CGC TGC GTG CTG GCT GTG 53
Met Ala Leu Ala Arg Cys Val Leu Ala Val
-20 -15

ATT TTA GGG GCA CTG TCT GTA GTG GCC CGC GCT GAT CCG GTC TCG CGC 101
Ile Leu Gly Ala Leu Ser Val Val Ala Arg Ala Asp Pro Val Ser Arg
-10 -5 1 5

TCT CCC CTT CAC CGC CCG CAT CCG TCC CCA CCG CGT TCC CAA CAC GCG 149
Ser Pro Leu His Arg Pro His Pro Ser Pro Pro Arg Ser Gln His Ala
10 15 20

CAC TAC CTT CCC AGC TCG CGG CGG CCA CCC AGG ACC CCG CGC TTC CCG 197
His Tyr Leu Pro Ser Ser Arg Arg Pro Pro Arg Thr Pro Arg Phe Pro
25 30 35

CTC CCG CTG CGG ATC CCC GCT GCC CAG CGC CCG CAG GTC CTC AGC ACC 245
Leu Pro Leu Arg Ile Pro Ala Ala Gln Arg Pro Gln Val Leu Ser Thr
40 45 50

GGG CAC ACG CCC CCG ACG ATT CCA CGC CGC TGC GGG GCA GGA GAG TCG 293
Gly His Thr Pro Pro Thr Ile Pro Arg Arg Cys Gly Ala Gly Glu Ser
55 60 65

TGG GGC AAT GCC ACC AAC CTC GGC GTC CCG TGT CTA CAC TGG GAC GAG 341
Trp Gly Asn Ala Thr Asn Leu Gly Val Pro Cys Leu His Trp Asp Glu
70 75 80 85

GTG CCG CCC TTC CTG GAG CGG TCG CCC CCG GCC ACT TGG GCT GAG CTG 389
Val Pro Pro Phe Leu Glu Arg Ser Pro Pro Ala Ser Trp Ala Glu Leu
90 95 100

CGA GGG CAG CCG CAC AAC TTC TGC CGG AGC CCG GAT GGC TCG GGC AGA 437
Arg Gly Gln Pro His Asn Phe Cys Arg Ser Pro Asp Gly Ser Gly Arg
105 110 115

CCT TGG TGC TTC TAT CGG AAT GCC CAG GGC AAA GTA GAC TGG GGC TAC 485
Pro Trp Cys Phe Tyr Arg Asn Ala Gln Gly Lys Val Asp Trp Gly Tyr
120 125 130

TGC GAT TGT GGT CAA GGC CCG GCG TTG CCC GTC ATT CGC CTT GTT GGT 533
Cys Asp Cys Gly Gln Gly Pro Ala Leu Pro Val Ile Arg Leu Val Gly
135 140 145

GGG AAC AGT GGG CAT GAA GGT CGA GTG GAG CTG TAC CAC GCT GGC CAG 581
Gly Asn Ser Gly His Glu Gly Arg Val Glu Leu Tyr His Ala Gly Gln
150 155 160 165

TGG GGG ACC ATC TGT GAC GAC CAA TGG GAC AAT GCA GAC GCA GAC GTC 629
Trp Gly Thr Ile Cys Asp Asp Gln Trp Asp Asn Ala Asp Ala Asp Val
170 175 180

ATC TGT AGG CAG CTG GGG CTC AGT GGC ATT GCC AAA GCA TGG CAT CAG 677
Ile Cys Arg Gln Leu Gly Leu Ser Gly Ile Ala Lys Ala Trp His Gln
185 190 195

GCA CAT TTT GGG GAA GGA TCT GGC CCA ATA TTG TTG GAT GAA GTA CGC 725
Ala His Phe Gly Glu Gly Ser Gly Pro Ile Leu Leu Asp Glu Val Arg
200 205 210

TGC ACC GGA AAC GAG CTG TCA ATT GAG CAA TGT CCA AAG AGT TCC TGG 773
Cys Thr Gly Asn Glu Leu Ser Ile Glu Gln Cys Pro Lys Ser Ser Trp
215 220 225

GGC GAA CAT AAC TGT GGC CAT AAA GAA GAT GCT GGA GTG TCT TGT GTT 821
Gly Glu His Asn Cys Gly His Lys Glu Asp Ala Gly Val Ser Cys Val
230 235 240 245

CCT CTA ACA GAT GGT GTC ATC AGA CTG GCA GGA GGA AAA AGT ACC CAT 869
Pro Leu Thr Asp Gly Val Ile Arg Leu Ala Gly Gly Lys Ser Thr His
250 255 260

GAA GGT CGC CTG GAG GTC TAC TAC AAG GGG CAG TGG GGG ACA GTC TGT 917
Glu Gly Arg Leu Glu Val Tyr Tyr Lys Gly Gln Trp Gly Thr Val Cys
265 270 275

GAT GAT GGC TGG ACT GAG ATG AAC ACA TAC GTG GCT TGT CGA CTG CTG 965
Asp Asp Gly Trp Thr Glu Met Asn Thr Tyr Val Ala Cys Arg Leu Leu
280 285 290

GGA TTT AAA TAC GGC AAA CAG TCC TCT GTG AAC CAT TTT GAT GGC AGC 1013
Gly Phe Lys Tyr Gly Lys Gln Ser Ser Val Asn His Phe Asp Gly Ser
295 300 305

AAC AGG CCC ATA TGG CTG GAT GAC GTC AGC TGC TCA GGA AAA GAA GTC 1061
Asn Arg Pro Ile Trp Leu Asp Asp Val Ser Cys Ser Gly Lys Glu Val
310 315 320 325

AGC TTC ATT CAG TGT TCC AGG AGA CAG TGG GGA AGG CAT GAC TGC AGC 1109
Ser Phe Ile Gln Cys Ser Arg Arg Gln Trp Gly Arg His Asp Cys Ser
330 335 340

CAT AGA GAA GAT GTG GGC CTC ACC TGC TAT CCT GAC AGC GAT GGA CAT 1157
His Arg Glu Asp Val Gly Leu Thr Cys Tyr Pro Asp Ser Asp Gly His
345 350 355

AGG CTT TCT CCA GGT TTT CCC ATC AGA CTA GTG GAT GGA GAG AAT AAG 1205
Arg Leu Ser Pro Gly Phe Pro Ile Arg Leu Val Asp Gly Glu Asn Lys
360 365 370

AAG GAA GGA CGA GTG GAG GTT TTT GTC AAT GGC CAA TGG GGA ACA ATC 1253
Lys Glu Gly Arg Val Glu Val Phe Val Asn Gly Gln Trp Gly Thr Ile
375 380 385

TGC GAT GAC GGA TGG ACC GAT AAG CAT GCA GCT GTG ATC TGC CGG CAA 1301
Cys Asp Asp Gly Trp Thr Asp Lys His Ala Ala Val Ile Cys Arg Gln
390 395 400 405

CTT GGC TAT AAG GGT CCT GCC AGA GCA AGG ACT ATG GCT TAT TTT GGG 1349
Leu Gly Tyr Lys Gly Pro Ala Arg Ala Arg Thr Met Ala Tyr Phe Gly
410 415 420

GAA GGA AAA GGC CCC ATC CAC ATG GAT AAT GTG AAG TGC ACA GGA AAT 1397
Glu Gly Lys Gly Pro Ile His Met Asp Asn Val Lys Cys Thr Gly Asn
425 430 435

GAG AAG GCC CTG GCT GAC TGT GTC AAA CAA GAC ATT GGA AGG CAC AAC 1445
Glu Lys Ala Leu Ala Asp Cys Val Lys Gln Asp Ile Gly Arg His Asn
440 445 450

TGC CGC CAC AGT GAG GAT GCA GGA GTC ATC TGT GAC TAT TTA GAG AAG 1493
Cys Arg His Ser Glu Asp Ala Gly Val Ile Cys Asp Tyr Leu Glu Lys
455 460 465

AAA GCA TCA AGT AGT GGT AAT AAA GAG ATG CTC TCA TCT GGA TGT GGA 1541
Lys Ala Ser Ser Ser Gly Asn Lys Glu Met Leu Ser Ser Gly Cys Gly
470 475 480 485

CTG AGG TTA CTG CAC CGT CGG CAG AAA CGG ATC ATT GGT GGG AAC AAT 1589
Leu Arg Leu Leu His Arg Arg Gln Lys Arg Ile Ile Gly Gly Asn Asn
490 495 500

TCT TTA AGG GGT GCC TGG CCT TGG CAG GCT TCC CTC AGG CTG AGG TCG 1637
Ser Leu Arg Gly Ala Trp Pro Trp Gln Ala Ser Leu Arg Leu Arg Ser
505 510 515

GCC CAT GGA GAC GGC AGG CTG CTT TGT GGA GCT ACC CTT CTG AGT AGC 1685
Ala His Gly Asp Gly Arg Leu Leu Cys Gly Ala Thr Leu Leu Ser Ser
520 525 530

TGC TGG GTC CTG ACA GCT GCA CAC TGC TTC AAA AGG TAC GGA AAC AAC 1733
Cys Trp Val Leu Thr Ala Ala His Cys Phe Lys Arg Tyr Gly Asn Asn
535 540 545

TCG AGG AGC TAT GCA GTT CGA GTT GGG GAT TAT CAT ACT CTG GTC CCA 1781
Ser Arg Ser Tyr Ala Val Arg Val Gly Asp Tyr His Thr Leu Val Pro
550 555 560 565

GAG GAG TTT GAA CAA GAA ATA GGG GTT CAA CAG ATT GTG ATT CAC AGG 1829
Glu Glu Phe Glu Gln Glu Ile Gly Val Gln Gln Ile Val Ile His Arg
570 575 580

AAC TAC AGG CCA GAC AGA AGC GAC TAT GAC ATT GCC CTG GTT AGA TTG 1877
Asn Tyr Arg Pro Asp Arg Ser Asp Tyr Asp Ile Ala Leu Val Arg Leu
585 590 595

CAA GGA CCA GGG GAG CAA TGT GCC AGA CTA AGC ACC CAC GTT TTG CCA 1925
Gln Gly Pro Gly Glu Gln Cys Ala Arg Leu Ser Thr His Val Leu Pro
600 605 610

GCC TGT TTA CCT CTA TGG AGA GAG AGG CCA CAG AAA ACA GCC TCC AAC 1973
Ala Cys Leu Pro Leu Trp Arg Glu Arg Pro Gln Lys Thr Ala Ser Asn
615 620 625

TGT CAC ATA ACA GGA TGG GGA GAC ACA GGT CGT GCC TAC TCA AGA ACT 2021
Cys His Ile Thr Gly Trp Gly Asp Thr Gly Arg Ala Tyr Ser Arg Thr
630 635 640 645

CTA CAA CAA GCT GCT GTG CCT CTG TTA CCC AAG AGG TTT TGT AAA GAG 2069
Leu Gln Gln Ala Ala Val Pro Leu Leu Pro Lys Arg Phe Cys Lys Glu
650 655 660

AGG TAC AAG GGA CTA TTT ACT GGG AGA ATG CTC TGT GCT GGG AAC CTC 2117
Arg Tyr Lys Gly Leu Phe Thr Gly Arg Met Leu Cys Ala Gly Asn Leu
665 670 675

CAA GAA GAC AAC CGT GTG GAC AGC TGC CAG GGA GAC AGT GGA GGA CCA 2165
Gln Glu Asp Asn Arg Val Asp Ser Cys Gln Gly Asp Ser Gly Gly Pro
680 685 690

CTC ATG TGT GAA AAG CCT GAT GAG TCC TGG GTT GTG TAT GGG GTG ACT 2213
Leu Met Cys Glu Lys Pro Asp Glu Ser Trp Val Val Tyr Gly Val Thr
695 700 705

TCC TGG GGG TAT GGA TGT GGA GTC AAA GAC ACT CCT GGA GTT TAT ACC 2261
Ser Trp Gly Tyr Gly Cys Gly Val Lys Asp Thr Pro Gly Val Tyr Thr
710 715 720 725

AGA GTC CCC GCT TTN GTA CCT TGG ATA AAA AGT GTC ACC AGT CTG 2306
Arg Val Pro Ala Phe Val Pro Trp Ile Lys Ser Val Thr Ser Leu
730 735 740

TAACTTATGG AAAGCTCAAG AANNAGTAAA ACAGTAACTA TTCAGTCTTC AAAAAAAAAA 2366
AAAAAAAAAA 2376
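
The number printed at the end of each sequence line is the running nucleotide count, so it can be recomputed from the sequence itself. The following minimal Python sketch illustrates this for the first two lines of the mouse sequence; the function name `running_positions` is hypothetical and the approach is an illustration, not part of the original disclosure.

```python
# Illustrative sketch; the function name is hypothetical, not from the patent.
def running_positions(sequence_lines, start=0):
    """Return the cumulative nucleotide count at the end of each line."""
    totals = []
    count = start
    for line in sequence_lines:
        # Count only nucleotide letters; spaces between codons are ignored.
        count += sum(1 for ch in line if ch in "ACGT")
        totals.append(count)
    return totals


# First two lines of the mouse sequence: 23 nt of 5'UTR plus 10 codons,
# followed by a full line of 16 codons.
lines = [
    "GGACCACACT CGGCGCCGCA GCC ATG GCG CTC GCC CGC TGC GTG CTG GCT GTG",
    "ATT TTA GGG GCA CTG TCT GTA GTG GCC CGC GCT GAT CCG GTC TCG CGC",
]
print(running_positions(lines))  # [53, 101]
```

Each full line of 16 codons advances the count by 48, which is why the printed positions step from 53 to 101, 149, 197 and so on.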